-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-41481: [CI] Update how extra environment variables are specified for the integration test docker job #42009
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, can we prefer -e
specified by command line to default values in docker-compose.yml
instead of this approach? Something like:
diff --git a/dev/archery/archery/docker/core.py b/dev/archery/archery/docker/core.py
index cb83106002..16e164562d 100644
--- a/dev/archery/archery/docker/core.py
+++ b/dev/archery/archery/docker/core.py
@@ -340,18 +340,8 @@ class DockerCompose(Command):
service = self.config.get(service_name)
args = []
- if user is not None:
- args.extend(['-u', user])
-
- if env is not None:
- for k, v in env.items():
- args.extend(['-e', '{}={}'.format(k, v)])
-
- if volumes is not None:
- for volume in volumes:
- args.extend(['--volume', volume])
-
- if self.config.using_docker or service['need_gpu'] or resource_limit:
+ use_docker = self.config.using_docker or service['need_gpu'] or resource_limit
+ if use_docker:
# use gpus, requires docker>=19.03
if service['need_gpu']:
args.extend(['--gpus', 'all'])
@@ -396,6 +386,18 @@ class DockerCompose(Command):
# name which we refer as image in general
args.append(service['image'])
+ if user is not None:
+ args.extend(['-u', user])
+
+ if env is not None:
+ for k, v in env.items():
+ args.extend(['-e', '{}={}'.format(k, v)])
+
+ if volumes is not None:
+ for volume in volumes:
+ args.extend(['--volume', volume])
+
+ if use_docker:
# add command from compose if it wasn't overridden
if command is not None:
args.append(command)
@kou Thank you for taking a look and apologies for taking a few days to circle back!
That is certainly what I would have expected; however, I am not sure that I feel confident modifying this part of the code. In any case, I think the change here that enables these specific jobs to run is a good one (everything pertaining to whether the tests do or do not run is controlled in the workflow file). |
fad0312
to
7dbef9d
Compare
ci/scripts/nanoarrow_build.sh
Outdated
echo "The nanoarrow source is missing. Please clone the arrow-nanoarrow repository" | ||
echo "to arrow/nanoarrow before running the integration tests:" | ||
echo " git clone https://github.com/apache/arrow-nanoarrow.git path/to/arrow/nanoarrow" | ||
echo "Not building nanoarrow because '${source_dir}' does not exist" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not fond of this, because it implies that a typo when invoking this script, or a missing checkout, will go unnoticed.
ci/scripts/nanoarrow_build.sh
Outdated
@@ -27,18 +27,21 @@ build_dir=${2}/nanoarrow | |||
# integration tests. Testing of the nanoarrow implementation in normal CI is handled | |||
# by github workflows in the arrow-nanoarrow repository. | |||
|
|||
# Include in the integration test by default if the source directory is present |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that the flag is still off by default in Archery:
arrow/dev/archery/archery/cli.py
Lines 741 to 743 in 680980e
@click.option('--with-nanoarrow', type=bool, default=False, | |
help='Include nanoarrow in integration tests', | |
envvar="ARCHERY_INTEGRATION_WITH_NANOARROW") |
docker-compose.yml
Outdated
@@ -1748,8 +1748,6 @@ services: | |||
volumes: *conda-volumes | |||
environment: | |||
<<: [*common, *ccache] | |||
ARCHERY_INTEGRATION_WITH_NANOARROW: 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious. Should we mark it =1 here and let the script take the passed value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great suggestion! I think if that happens, then the script will error if somebody doesn't have a nanoarrow checkout (and so a local run of archery docker run ...
will fail perhaps unexpectedly).
@vibhatha @pitrou Thank you for taking a look! I think the right thing to do is probably to run the integration tests from the nanoarrow repo where there are fewer arrow-related constraints (e.g., copy what's done in the arrow-rs repo). The right thing to do here is to apply @kou's suggestion to fix the root cause (unexpected behaviour of |
Ah, sorry. My patch was wrong. How about this? diff --git a/dev/archery/archery/docker/core.py b/dev/archery/archery/docker/core.py
index ec422e9aaf..5be4887ea4 100644
--- a/dev/archery/archery/docker/core.py
+++ b/dev/archery/archery/docker/core.py
@@ -383,10 +383,6 @@ class DockerCompose(Command):
args.append(f'--memory={memory}')
args.append(f'--memory-swap={memory}')
- # get the actual docker image name instead of the compose service
- # name which we refer as image in general
- args.append(service['image'])
-
if user is not None:
args.extend(['-u', user])
@@ -399,6 +395,10 @@ class DockerCompose(Command):
args.extend(['--volume', volume])
if use_docker:
+ # get the actual docker image name instead of the compose service
+ # name which we refer as image in general
+ args.append(service['image'])
+
# add command from compose if it wasn't overridden
if command is not None:
args.append(command) |
1b938b8
to
932ccdc
Compare
Co-authored-by: Sutou Kouhei <[email protected]>
932ccdc
to
74b2f96
Compare
@kou This seems to be working...is there any additional changes needed to this approach? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 21238a7. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 14 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
Currently, nanoarrow and Rust are not being included in the integration tests. The command issued by archery includes multiple environment variable definitions and the rightmost ones disable the extra environment variables.
https://github.com/apache/arrow/actions/runs/9397807525/job/25881776553#step:9:353
What changes are included in this PR?
This PR updates how environment variables are specified such that the intended value is passed to the docker build.
Are these changes tested?
Yes
Are there any user-facing changes?
No